Abstract

We propose a method for super-resolution of range images. Our approach leverages the interpretation of a
low-resolution (LR) image as sparse samples on the high-resolution (HR) grid. Based on this interpretation, we
demonstrate that our recently reported approach, which reconstructs dense range images from sparse range data by
exploiting a registered colour image, can be applied to the task of resolution enhancement of range images. Our
method uses only a single colour image in addition to the range observation in the super-resolution process. Using
the proposed approach, we demonstrate super-resolution results for large factors (e.g. 4) with good localization accuracy.

1 Introduction 

Within the last decade, range cameras and scanners have become established as an important acquisition modality in
the computer vision and multimedia communities. Applications where range cameras are successfully employed include
digital heritage [1], industrial inspection, filming and 3DTV [2], gaming [3], etc. Range cameras operate on a
variety of technologies involving laser scanning [14][13], time-of-flight imaging [?] and structured or coded lighting
[12], which determine aspects such as accuracy, resolution, speed of acquisition and cost. For instance,
laser-based and structured-light-based scanners provide accurate, high-resolution (HR) range scans, but the
acquisition is typically slow and requires a good amount of manual effort. On the other hand, low-cost laser scanners,
time-of-flight scanners and light-coding range scanners (e.g. Kinect [3]) acquire range images much faster and with
little manual intervention, but are limited in accuracy and resolution. Such a trade-off motivates the development of
computational approaches to enhance the resolution and accuracy, so as to make range scanning more affordable and
less time-consuming.

Low spatial resolution is a common issue with low-cost range scanning. Often, the acquired range image resolution
is of the order of 100 to 200 pixels (in one direction) [7]. Compared to digital cameras (which typically span
megapixels), these acquisition grids are too small to capture sufficient scene detail, edges, etc. Hence, the resolution
needs to be enhanced by large factors (e.g. 4 or 6). Naive approaches that enhance the resolution by image
interpolation result in a heavy loss of accuracy. As a result, recent years have seen the rise of various sophisticated
approaches [4][5][6] that enhance the resolution of range images while preserving edges and maintaining
good accuracy.

A key to enhancing the spatial resolution of range images by large factors is to utilize a high-resolution registered
colour image of the same scene, as done in these approaches. The colour image can be easily captured using any
off-the-shelf digital camera. The use of an HR colour image is motivated by the observation that the
prominent depth discontinuities (such as those between different objects in a scene) commonly coincide with colour
discontinuities [14]. Thus, information about the presence or absence of colour discontinuities and the local
similarity of colour, derived from the HR colour image, helps in localizing range discontinuities and estimating dense
range information on the HR grid. Such a cue from the colour image has also been used in many stereo disparity
estimation approaches.

In this work, we observe that the problem of range super-resolution (SR) can, in fact, be interpreted as that of
reconstructing a range image from sparsely captured data. In this context, we recently reported an approach for
the reconstruction of dense range maps from sparsely captured range data [7], which uses a similar cue from the
registered colour image via its segmentation. The advantages of this approach include a relatively simple local
estimation scheme, good accuracy, and computational efficiency. Here, we demonstrate that this approach also holds
for the task of range super-resolution.

¹ The registration can be achieved via a one-time pre-calibration of the digital and range cameras, or using any of the
well-established landmark-based registration approaches. Similar to the approaches referred to above [4][5], we too
assume that the range and colour images are pre-registered.

Figure 1: (a) LR range image and (b) its corresponding sparse HR interpretation. The HR grid is 4x the LR grid in
each direction.

The paper is organized as follows. In the next subsection, we discuss some related work. Section 2 covers our
methodology, which includes a brief description of the basic approach in [7]. We then provide some results in Section
3, and conclude in Section 4.

1.1 Related work 

As mentioned earlier, the idea of using a colour image has been explored for range super-resolution in various ways.
For example, the work in [16] interpolates the range image and, by exploiting the assumption that depth discontinuities
coincide with colour edges, improves estimation at discontinuities. Similar improvements are shown in MRF-based
energy minimization approaches that also use an HR optical image [?][?]. However, these approaches are not known
to perform well for large super-resolution factors.

The authors in [4] propose an application of bilateral filtering which exploits constraints from the HR colour image.
This work demonstrates the ability to achieve good-quality super-resolution by large factors. However, the bilateral
filter, which is based on the HR colour image, has to be defined at each pixel. This makes the approach computationally
very demanding. Our approach, which works on image segments rather than pixels and is based on computing local
costs over segments, is considerably more efficient.

Another recent approach which also employs a colour image is reported in [5]. This method also uses the
segmentation of the colour image. However, beyond this, it differs significantly from our method. The range labels are
computed using a modified version of bilinear interpolation which is directed by the colour segmentation. On the other
hand, our approach computes range labels based on the minimization of local costs with regularization constraints.

More recently, an example-based range super-resolution approach has also been reported [6]. It also performs well at
large resolution factors, but requires a separate range dataset from which the training examples are derived. In contrast,
our approach, like the above-mentioned methods, requires only a registered colour image.

2 Resolution enhancement as reconstruction of sparse range data 

As mentioned earlier, our approach is based upon the idea of treating the super-resolution problem as that of range
reconstruction from sparse data [7]. This is inspired by the idea that LR range images can essentially be interpreted
as down-sampled versions of their HR counterparts, where the down-sampling is obtained by selecting range pixels at
regular intervals (depending on the SR factor). That is, the LR image is modeled as a decimation of the HR image.
While most image super-resolution approaches model the down-sampling by averaging of the HR pixels, the
decimation model for down-sampling is not uncommon for range super-resolution [9][8]. Thus, observing an LR image
can be interpreted as observing only those samples of the HR image which were selected during the down-sampling
process, with the spaces between these samples having a zero value on the HR grid, thus resulting in a sparse HR image.
An example of the sparse interpretation of an LR range image is shown in Fig. 1, where, for a resolution enhancement
by a factor of 4, close to 93% of the data is missing. Such large magnitudes of missing data are also considered in [7].
Hence, the SR task is essentially that of reconstructing such a sparse HR image.
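
The sparse interpretation can be sketched in a few lines of Python. This is only a minimal illustration of the
decimation model described above. The function names and the convention of placing each LR sample at the top-left
corner of its factor x factor block are assumptions made for this sketch, not details taken from the paper.

```python
def sparse_hr_from_lr(lr, factor):
    """Place LR range samples on an HR grid; unobserved pixels are set to 0."""
    rows, cols = len(lr), len(lr[0])
    hr = [[0.0] * (cols * factor) for _ in range(rows * factor)]
    for i in range(rows):
        for j in range(cols):
            # Assumed convention: LR sample (i, j) lands at (i*factor, j*factor).
            hr[i * factor][j * factor] = lr[i][j]
    return hr


def missing_fraction(factor):
    """Fraction of HR pixels left unobserved under regular decimation."""
    return 1.0 - 1.0 / (factor * factor)


lr = [[1.0, 2.0], [3.0, 4.0]]
hr = sparse_hr_from_lr(lr, 4)
print(len(hr), len(hr[0]))   # 8 8
print(missing_fraction(4))   # 0.9375
```

For a factor of 4, only one HR pixel in every 16 is observed, i.e. 93.75% of the HR grid is empty, which is in line
with the figure of roughly 93% quoted above.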

2.1 Proposed method 

As mentioned, we begin by representing the low-resolution range image on the high-resolution image grid by uniformly
spacing the pixels of the low-resolution range image on the HR grid. The associated colour image is of the same size
as this HR image grid.

The method starts with a segmentation of the registered colour image. The colour segmentation is carried out using
the well-known mean-shift algorithm (MSA) [10]. While a detailed description of the MSA is beyond the scope of
this paper, it is worthwhile to state that the MSA uses two kernels, defined on the colour dimension and the spatial
dimension, whose bandwidths determine the coarseness of the final colour segmentation. The greater these bandwidths,
the coarser the segmentation and the bigger the segments. These two parameters are important in the context of the
approach in [7]. At the beginning, the method uses small bandwidths and, as a result, produces a largely
over-segmented image, where each object is divided into multiple segments.

Following the initial colour segmentation, the approach computes a plane fit for each segment, based on the available
range pixels in the segment. However, this is done only if the number of visible range pixels in that segment, $U_a$,
exceeds a threshold $U_{pl}$ (i.e. $U_a > U_{pl}$). The plane fitting is carried out using the RANSAC method [?]. Given
the fitted plane over a segment, a local cost is defined for assigning a range label $z$ to each invisible pixel $p$ in
the segment as

$$C_p = |z - z_{pl}| + \lambda_p \sum_{q \in V_p} |z - z_q| \qquad (1)$$

Here, $z_{pl}$ is the plane-fitted range at $p$ and $V_p$ is the set of visible second-order neighbours of $p$ that
belong to segment $s$. The first term computes the distance from the plane-fitted range. The second term, weighted by
$\lambda_p$, enforces similarity between neighbours. The label estimated for the pixel $p$ is simply the one which
minimizes $C_p$.
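
The plane-fitting and labeling step can be sketched as follows. This is a hedged illustration rather than the
authors' implementation: the plane parameterization z = a*x + b*y + c, the names `fit_plane`, `ransac_plane` and
`label_pixel`, and the parameters `n_iters`, `tol` and `seed` are all assumptions of the sketch. The label search
relies on the fact that the local cost is convex and piecewise linear in z, so a minimizer can be found among the
plane-fitted range and the neighbouring ranges; the paper does not state how the minimization is carried out.

```python
import random


def fit_plane(p1, p2, p3):
    """Plane z = a*x + b*y + c through three (x, y, z) points, or None if degenerate."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = p1, p2, p3
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    if abs(det) < 1e-12:  # the three pixels are collinear in the image plane
        return None
    a = ((z2 - z1) * (y3 - y1) - (z3 - z1) * (y2 - y1)) / det
    b = ((x2 - x1) * (z3 - z1) - (x3 - x1) * (z2 - z1)) / det
    c = z1 - a * x1 - b * y1
    return a, b, c


def ransac_plane(points, n_iters=200, tol=0.5, seed=0):
    """Return the candidate plane supported by the most inliers (|residual| <= tol)."""
    rng = random.Random(seed)
    best, best_inliers = None, -1
    for _ in range(n_iters):
        plane = fit_plane(*rng.sample(points, 3))
        if plane is None:
            continue
        a, b, c = plane
        inliers = sum(1 for x, y, z in points if abs(a * x + b * y + c - z) <= tol)
        if inliers > best_inliers:
            best, best_inliers = plane, inliers
    return best


def label_pixel(z_pl, neighbour_ranges, lam):
    """Minimize |z - z_pl| + lam * sum_q |z - z_q| over the candidate labels."""
    def cost(z):
        return abs(z - z_pl) + lam * sum(abs(z - zq) for zq in neighbour_ranges)
    # The cost is convex and piecewise linear, so a minimizer lies at a breakpoint.
    return min([z_pl] + list(neighbour_ranges), key=cost)


# Visible samples on the plane z = 2x + y + 10, plus one gross outlier.
pts = [(x, y, 2.0 * x + y + 10.0) for x in range(4) for y in range(4)]
pts.append((1, 2, 100.0))
a, b, c = ransac_plane(pts)
z_pl = a * 1.5 + b * 2.5 + c  # plane-fitted range at an invisible pixel
print(label_pixel(z_pl, [15.0, 16.0], lam=0.5))
```

In this toy case the outlier does not influence the fitted plane, which is the reason for preferring a robust fit
over ordinary least squares when some visible range pixels in a segment are unreliable.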

If, for a segment, $0 < U_a < U_{pl}$, then plane fitting may not be robust enough. For such segments, a median range
$z_m$ over the $U_a$ pixels is computed. In addition, the medians of the visible pixels over the adjacent segments are
also computed. Based on this, a cost is defined as

$$C_s = |z - z_m| + \sum_{z_{m_a} \in M_a} W_a \, |z - z_{m_a}| \qquad (2)$$

where $M_a$ is the set of the medians $z_{m_a}$ of the visible pixels of the segments $a \in A_s$ adjacent to $s$. In
equation (2), the second term enforces similarity over neighbouring segments. The weight $W_a$ is defined for a pair of
adjacent segments $s$ and $a$ as follows



$$W_a \propto \frac{1}{|r_s - r_a| + |g_s - g_a| + |b_s - b_a|}$$

Here, $r_i$, $g_i$, $b_i$ are the mean RGB intensities for segment $i$. This contextual weighting strengthens the
smoothness between similarly coloured segments but weakens it for segments with large colour differences. The label
minimizing $C_s$ is assigned to all the missing pixels in the segment $s$. Recall that this assignment is only for
segments for which $0 < U_a < U_{pl}$. These are typically small segments for which a constant-surface assumption is
quite valid.
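
The segment-level labeling can be illustrated with a short sketch. It assumes, for illustration only, that the
colour weight is the inverse of the summed absolute differences of the mean RGB values, with a small constant `eps`
added to avoid division by zero, and that the weight is applied per adjacent segment. The names `colour_weight` and
`label_segment`, the value of `eps`, and the search over candidate medians are all assumptions of this sketch.

```python
from statistics import median


def colour_weight(mean_rgb_s, mean_rgb_a, eps=1.0):
    """Assumed weight: inverse of the L1 distance between mean RGB values."""
    diff = sum(abs(cs - ca) for cs, ca in zip(mean_rgb_s, mean_rgb_a))
    return 1.0 / (diff + eps)  # eps is hypothetical; it avoids division by zero


def label_segment(visible_ranges, mean_rgb, neighbours):
    """Minimize |z - z_m| + sum_a W_a * |z - z_ma| over the candidate medians.

    neighbours: list of (mean_rgb_a, visible_ranges_a) for adjacent segments
    that contain at least one visible range pixel.
    """
    z_m = median(visible_ranges)
    terms = [(colour_weight(mean_rgb, rgb_a), median(r_a)) for rgb_a, r_a in neighbours]

    def cost(z):
        return abs(z - z_m) + sum(w * abs(z - z_ma) for w, z_ma in terms)

    # The cost is convex and piecewise linear, so a minimizer lies at a median.
    return min([z_m] + [z_ma for _, z_ma in terms], key=cost)


# A reddish segment next to a similar reddish segment and a very different blue one.
label = label_segment(
    [50.0, 52.0],
    (200, 10, 10),
    [((198, 12, 10), [51.0, 51.0, 53.0]), ((10, 10, 200), [120.0])],
)
print(label)  # 51.0
```

The distant blue neighbour (median range 120) receives a very small weight and barely affects the result, whereas
several similarly coloured neighbours can jointly pull the label towards their median.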

As one would have observed, the above process only labels the pixels in the segments for which $U_a > 0$. However,
there will be many segments with $U_a = 0$ when a large number of pixels are missing. Just one pass of the above
process, with a fixed colour segmentation, will leave many pixels unlabeled. To resolve this issue, the mean-shift
segmentation is performed, in each iteration, with different (slightly larger) kernel bandwidths, so that as the
iterations progress, the segmentation produces somewhat larger segments. This process essentially subsumes many of the
currently labeled and unlabeled segments into common larger segments. Thus, in subsequent iterations, many of the
currently unlabeled segments are no longer isolated, which makes their pixels eligible for labeling. Since similar
segments are more likely to be grouped, the demarcation at prominent range discontinuities, which correspond to largely
different segments, is still maintained. Thus, the segment expansion process allows the cost computation modules to
label all the pixels over the iterations while maintaining prominent range discontinuities. A complete labeling is
typically achieved in 4-5 iterations.
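
The iterative segment expansion can be outlined with a toy example. This is a schematic sketch under stated
assumptions: `iterate_labelling`, `segment_fn`, `label_segment_fn` and the bandwidth schedule are hypothetical names
and values, the real segmentation would be the mean-shift step described earlier, and pixels labeled in earlier
iterations are assumed to count as available in later ones. Here a one-dimensional "image" of 8 pixels is split
into chunks whose width plays the role of the bandwidth.

```python
from statistics import median


def iterate_labelling(ranges, segment_fn, label_segment_fn,
                      bandwidths=(4.0, 6.0, 8.0, 10.0, 12.0)):
    """Repeat segment-wise labelling with progressively coarser segmentations.

    ranges:           dict pixel -> range value, or None where unobserved
    segment_fn:       bandwidth -> list of segments (each a list of pixels)
    label_segment_fn: (segment, ranges) -> dict of new labels for that segment
    """
    ranges = dict(ranges)
    for h in bandwidths:  # assumed schedule, from fine to coarse
        for segment in segment_fn(h):
            # Only segments containing at least one known range can be labelled.
            if any(ranges[p] is not None for p in segment):
                ranges.update(label_segment_fn(segment, ranges))
        if all(v is not None for v in ranges.values()):
            break  # every pixel now has a label
    return ranges


def fill_with_median(segment, ranges):
    """Toy labelling rule: give unlabelled pixels the median of the known ones."""
    seen = [ranges[p] for p in segment if ranges[p] is not None]
    return {p: median(seen) for p in segment if ranges[p] is None}


def chunks(width):
    """Toy 'segmentation' of pixels 0..7 into runs of the given width."""
    return [list(range(i, i + width)) for i in range(0, 8, width)]


sparse = {p: None for p in range(8)}
sparse[0], sparse[4] = 5.0, 9.0
full = iterate_labelling(sparse, chunks, fill_with_median, bandwidths=(2, 4))
print([full[p] for p in range(8)])  # [5.0, 5.0, 5.0, 5.0, 9.0, 9.0, 9.0, 9.0]
```

With the finer segmentation, the chunks [2, 3] and [6, 7] contain no known range and stay unlabeled. Once the
segments grow, they are merged with labeled neighbours and receive labels, while the discontinuity between pixels 3
and 4 is preserved.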
Figure 2: (a,b,c) Colour images from the Middlebury dataset used in our experiments. (d,e,f) Corresponding
low-resolution range images.



3 Results 

We now provide some results for our range super-resolution method described in the previous section. We conducted
experiments with range images from the Middlebury dataset [?], which also has colour images associated with the
range data. Some examples of the colour images corresponding to the scenes used in our experiments are shown in
Figs. 2(a,b,c). The original resolution of the range images is of the order of 400 x 400 pixels. We down-sample these
images and use the down-sampled versions as LR range images. We then compare our SR results with the original
range images (the ground truth).

The results for SR by a factor of 4 are shown in Fig. 3. The low-resolution images are shown in Figs. 2(d,e,f). One
can observe that it is difficult to distinctly perceive the image content in the LR images. Note that for a factor of
4, the bicubic-interpolated image is very blurred at the edges (Figs. 3(a,d,g)). Hence, the object shapes clearly lack
proper localization. In comparison, our super-resolution result (Figs. 3(b,e,h)) shows clear improvements over the
interpolated image. Such an improvement in the localization is the essence of range super-resolution. In addition, one
can notice a high level of fidelity when the super-resolved range images are compared with the ground truth
(Figs. 3(c,f,i)). As mentioned earlier, the factor-of-4 case has a total of 93% missing data in its sparse
interpretation. In spite of this, our approach is able to maintain localization and fidelity in the eventual result.

Lastly, for completeness, we note that our super-resolution approach takes less than a couple of minutes for
each of the above cases, on a 3.2 GHz Xeon CPU with 12 GB RAM, with a MATLAB implementation. Thus, we believe
that our approach also performs well from the point of view of computational efficiency. The efficiency of our method
can be attributed to its local nature (i.e. there is no global energy minimization), as well as its segment-based
(rather than explicitly pixel-based) processing.



4 Conclusion 

In this work, we investigated the problem of range super-resolution from the point of view of reconstructing dense
range maps from sparse data. Based on the interpretation of LR images as sparse samples on the HR grid, we employed
a recently proposed method for sparse range reconstruction, and demonstrated its applicability to the task of range
super-resolution. Our approach shows promising results even for large resolution factors such as 4. In the future, it
would be interesting to gauge the performance of the approach and investigate further improvements under noise and
inexact registration between the range and colour images.
Figure 3: SR by a factor of 4: (a,d,g) Bicubic interpolation. (b,e,h) Super-resolved range image using our approach.
(c,f,i) Ground-truth range.